Characterizing novel endogenous retroviruses from genetic variation inferred from short sequence reads

Authors

  • Tobias Mourier
  • Sarah Mollerup
  • Lasse Vinner
  • Thomas Arn Hansen
  • Kristín Rós Kjartansdóttir
  • Tobias Guldberg Frøslev
  • Torsten Snogdal Boutrup
  • Lars Peter Nielsen
  • Eske Willerslev
  • Anders J. Hansen
Abstract

From Illumina sequencing of DNA from brain and liver tissue from the lion, Panthera leo, and tumor samples from the pike-perch, Sander lucioperca, we obtained two assembled sequence contigs with similarity to known retroviruses. Phylogenetic analyses suggest that the pike-perch retrovirus belongs to the epsilonretroviruses, and the lion retrovirus to the gammaretroviruses. To determine whether these novel retroviral sequences originate from an endogenous retrovirus or from a recently integrated exogenous retrovirus, we assessed the genetic diversity of the parental sequences from which the short Illumina reads are derived. First, we show by simulations that the level of genetic diversity can be robustly inferred from short sequence reads. Second, we find that the measures of nucleotide diversity inferred from our retroviral sequences significantly exceed the level observed in Human Immunodeficiency Virus infections, leading us to conclude that both novel retroviruses are of endogenous origin. Through further simulations, we rule out the possibility that the observed elevated levels of nucleotide diversity result from co-infection with two closely related exogenous retroviruses.
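To make the notion of nucleotide diversity inferred from short reads concrete, the following is a minimal sketch and not the authors' pipeline. It assumes the reads have already been aligned to a contig and summarized as pileup columns (one string of observed bases per site), and it uses the standard per-site estimator of the probability that two reads drawn without replacement differ. The function names `site_diversity` and `nucleotide_diversity`, the `min_depth` parameter, and the toy columns are all hypothetical.

```python
from collections import Counter


def site_diversity(bases):
    """Chance that two reads drawn without replacement differ at one site.

    `bases` is a string (or list) of the bases observed at that site.
    Returns None when fewer than two reads cover the site.
    """
    n = len(bases)
    if n < 2:
        return None
    counts = Counter(bases)
    # Number of ordered read pairs that carry the same base.
    same_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_pairs / (n * (n - 1))


def nucleotide_diversity(pileup_columns, min_depth=2):
    """Average per-site diversity over all sites with enough coverage.

    `min_depth` is an illustrative coverage filter, not a value taken
    from the paper.
    """
    values = []
    for column in pileup_columns:
        if len(column) < min_depth:
            continue
        d = site_diversity(column)
        if d is not None:
            values.append(d)
    return sum(values) / len(values) if values else float("nan")


# Toy pileup: four sites, four reads each (made-up data).
columns = ["AAAA", "AAGG", "CCCC", "TTTA"]
pi = nucleotide_diversity(columns)
print(f"estimated nucleotide diversity: {pi:.4f}")
```

In a real analysis, sequencing errors inflate this estimate, so base-quality filtering or an explicit error model would be needed before comparing values between samples.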

Related articles

Assignment of Endogenous Retrovirus Integration Sites Using a Mixture Model

Structural variation occurs in the genomes of individuals because of the different positions occupied by repetitive genome elements like endogenous retroviruses, or ERVs. The presence or absence of ERVs can be determined by identifying the junction with the host genome using high-throughput sequence technology and a clustering algorithm. The resulting data give the number of sequence reads assi...

Similarity thresholds used in DNA sequence assembly from short reads can reduce the comparability of population histories across species

Comparing inferences among datasets generated using short read sequencing may provide insight into the concerted impacts of divergence, gene flow and selection across organisms, but comparisons are complicated by biases introduced during dataset assembly. Sequence similarity thresholds allow the de novo assembly of short reads into clusters of alleles representing different loci, but the result...

Read Clouds Uncover Variation in Complex Regions of the Human Genome

Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient leng...

Leveraging reads that span multiple single nucleotide polymorphisms for haplotype inference from sequencing data

MOTIVATION Haplotypes, defined as the sequence of alleles on one chromosome, are crucial for many genetic analyses. As experimental determination of haplotypes is extremely expensive, haplotypes are traditionally inferred using computational approaches from genotype data, i.e. the mixture of the genetic information from both haplotypes. Best performing approaches for haplotype inference rely on...

Endogenous Retroviruses and the Human Genome: Implications for Human Disease

Introduction Lifecycle and Genome Organization Endogenization Process and Host-Virus Evolution Classification of HERVs HERV Manipulation of Host Function Host Defenses against HERVs Cancer Multiple Sclerosis Schizophrenia Azoospermia Conclusions Sources Abstract Human endogenous retroviruses (HERVs) constitute approximately 8% of the human genome (16, 22, 39, 55). These ancient retroviruses rep...

Journal title:

Volume 5  Issue

Pages  -

Publication date 2015